In [ ]:
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Deep Learning Design Patterns - Code Labs

Lab Exercise #2 - Get Familiar with Convolutional Neural Networks (CNN)

Prerequisites:

1. Familiar with Python
2. Completed Chapter II: Convolutional Neural Networks

Objectives:

1. Create a basic CNN.
2. Create a VGG-class CNN.
3. Create a CNN with an identity link (Residual CNN)

Basic CNN as Sequential API

Let's create a basic CNN with two convolutional layers, each followed by a max pooling layer.

We will use these approaches:

1. We will double the number of filters with each subsequent layer.
2. We will reduce the size of the feature maps by using a stride > 1.

Fill in the blanks (replace each ??), make sure the code passes the Python interpreter, and then verify its correctness against the summary output.

You will need to:

1. Set the number of channels on the input vector (i.e., input shape).
2. Set the number of filters and stride on the convolutional layers.
3. Set the max pooling window size and stride.

In [ ]:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense

# Let's start with a Sequential model
model = Sequential()

# Let's assume we are building a model for CIFAR-10, which are 32x32 RGB images
# HINT: how many channels are in an RGB image
input_shape=(32, 32, ??)

# Let's add a first convolution layer with 16 filters of size 3x3 and stride of 2
# HINT: first parameter is the number of filters and the second is the filter (kernel) size
model.add(Conv2D(??, ??, strides=2, activation='relu', input_shape=input_shape))

# Let's reduce the feature maps by 75%
# HINT: 2x2 window and move 2 pixels at a time
model.add(MaxPooling2D(??, strides=??))

# Let's add a second convolution layer with 3x3 filter and strides=2 and double the filters
# HINT: double the number of filters you specified in the first Conv2D
model.add(Conv2D(??, ??, strides=2, activation='relu'))

# Let's reduce the feature maps by 75%
model.add(MaxPooling2D(??, strides=??))

model.add(Dense(10, activation='softmax'))

Verify the model architecture using the summary() method

It should look like below:

Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_4 (Conv2D)            (None, 15, 15, 16)        448       
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 7, 7, 16)          0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 3, 3, 32)          4640      
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 1, 1, 32)          0         
_________________________________________________________________
dense_3 (Dense)              (None, 1, 1, 10)          330       
=================================================================
Total params: 5,418
Trainable params: 5,418
Non-trainable params: 0
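
The shapes and parameter counts in this summary can be checked by hand. The sketch below is plain Python (no TensorFlow needed). The helper names conv_out, conv_params and dense_params are made up for illustration and are not part of Keras. It assumes the Keras default padding='valid', where a window of size k moved with stride s over n pixels yields floor((n - k) / s) + 1 outputs.

```python
def conv_out(n, kernel, stride):
    """Output width/height of an unpadded ('valid') convolution or pooling window."""
    return (n - kernel) // stride + 1

def conv_params(kernel, in_channels, filters):
    """kernel x kernel x in_channels weights per filter, plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(in_units, out_units):
    """One weight per input/output pair, plus one bias per output."""
    return in_units * out_units + out_units

size = 32                                   # CIFAR-10 images are 32x32
sizes = []
# (window size, stride) for: conv, max pool, conv, max pool
for kernel, stride in [(3, 2), (2, 2), (3, 2), (2, 2)]:
    size = conv_out(size, kernel, stride)
    sizes.append(size)
print(sizes)                                # [15, 7, 3, 1]

total = conv_params(3, 3, 16) + conv_params(3, 16, 32) + dense_params(32, 10)
print(total)                                # 5418
```

Each stride-2 step roughly halves the height and the width, which is where the "reduce by 75%" in the code comments comes from.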

In [ ]:
model.summary()

VGG16 as Sequential API

Next, we will create a VGG convolutional network. VGG networks are sequential, but they add the concept of convolutional groups. The basic elements of a VGG are:

1. Each convolutional group consists of two or more convolutional layers.
2. Max pooling is deferred to the end of the convolutional group.
3. Each convolutional group has the same number of filters as the previous group, or
   double that number.
4. Multiple dense layers are used for the classifier.

You will need to:

1. Set the number of filters, filter size and padding on the stem convolutional group.
2. Set the number of filters for the convolutional blocks.
3. Add the flattening layer between the feature learning and classifier groups.
4. Set the number of nodes in the dense layers of the classifier.

In [ ]:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

def conv_block(n_layers, n_filters):
    """
        n_layers : number of convolutional layers
        n_filters: number of filters
    """
    for n in range(n_layers):
        model.add(Conv2D(n_filters, (3, 3), strides=(1, 1), padding="same",
                  activation="relu"))
    model.add(MaxPooling2D(2, strides=2))

# Create a Sequential Model
model = Sequential()

# Add Convolutional Frontend with 64 3x3 filters of stride 1
# Set the padding so the output feature maps keep the same height and width as the input
# (Keras fills the "imaginary" pixels beyond the edges of the image with zeros).
model.add(Conv2D(??, ??, strides=(1, 1), padding=??, activation="relu",
          input_shape=(224, 224, 3)))


# These are the convolutional groups - double the number of filters on each progressive group
conv_block(1, 64)
conv_block(2, ??)
conv_block(3, ??)

# The last two groups in a VGG16 double the number of filters of the previous group, but both groups are the same size.
# HINT: the number should be the same for both
conv_block(3, ??)
conv_block(3, ??)

# Add layer to transition from final 2D feature maps (bottleneck layer) to 1D vector for DNN.
# HINT: think of what you need to do to the 2D feature maps from the convolutional layers before passing to dense layers.
model.add(??)
# Add DNN Backend with two layers of 4096 nodes
# HINT: the number of nodes is given in the comment above.
model.add(Dense(??, activation='relu'))
model.add(Dense(??, activation='relu'))

# Output layer for classification (1000 classes)
model.add(Dense(1000, activation=??))

Verify the model architecture using the summary() method

It should look like below:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_14 (Conv2D)           (None, 224, 224, 64)      1792      
_________________________________________________________________
conv2d_15 (Conv2D)           (None, 224, 224, 64)      36928     
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 112, 112, 64)      0         
_________________________________________________________________
conv2d_16 (Conv2D)           (None, 112, 112, 128)     73856     
_________________________________________________________________
conv2d_17 (Conv2D)           (None, 112, 112, 128)     147584    
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 56, 56, 128)       0         
_________________________________________________________________
conv2d_18 (Conv2D)           (None, 56, 56, 256)       295168    
_________________________________________________________________
conv2d_19 (Conv2D)           (None, 56, 56, 256)       590080    
_________________________________________________________________
conv2d_20 (Conv2D)           (None, 56, 56, 256)       590080    
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 28, 28, 256)       0         
_________________________________________________________________
conv2d_21 (Conv2D)           (None, 28, 28, 512)       1180160   
_________________________________________________________________
conv2d_22 (Conv2D)           (None, 28, 28, 512)       2359808   
_________________________________________________________________
conv2d_23 (Conv2D)           (None, 28, 28, 512)       2359808   
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 14, 14, 512)       0         
_________________________________________________________________
conv2d_24 (Conv2D)           (None, 14, 14, 512)       2359808   
_________________________________________________________________
conv2d_25 (Conv2D)           (None, 14, 14, 512)       2359808   
_________________________________________________________________
conv2d_26 (Conv2D)           (None, 14, 14, 512)       2359808   
_________________________________________________________________
max_pooling2d_10 (MaxPooling (None, 7, 7, 512)         0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 25088)             0         
_________________________________________________________________
dense_4 (Dense)              (None, 4096)              102764544 
_________________________________________________________________
dense_5 (Dense)              (None, 4096)              16781312  
_________________________________________________________________
dense_6 (Dense)              (None, 1000)              4097000   
=================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
__________________________
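
The total in this summary can be reproduced with a little arithmetic. This is a plain-Python sketch, not Keras code. The helper names are illustrative, and the list of filter counts is simply read off the Conv2D rows of the summary above.

```python
def conv_params(in_channels, filters, kernel=3):
    """kernel x kernel x in_channels weights per filter, plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(in_units, out_units):
    """One weight per input/output pair, plus one bias per output."""
    return in_units * out_units + out_units

# Filters of the 13 convolutional layers, in order (stem first)
filters_per_layer = [64, 64, 128, 128, 256, 256, 256,
                     512, 512, 512, 512, 512, 512]

channels = 3                     # RGB input
conv_total = 0
for filters in filters_per_layer:
    conv_total += conv_params(channels, filters)
    channels = filters

side = 224 // 2 ** 5             # five 2x2 max pools with stride 2: 224 -> 7
flat = side * side * channels    # size of the flattened vector
dense_total = (dense_params(flat, 4096) + dense_params(4096, 4096)
               + dense_params(4096, 1000))

print(flat)                      # 25088
print(conv_total + dense_total)  # 138357544
```

Note that the three dense layers account for about 124 million of the 138 million parameters, with the first dense layer alone contributing over 100 million.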

In [ ]:
model.summary()

Residual CNN as Functional API

Finally, we will create a residual convolutional network (ResNet). The basic elements of a ResNet are:

1. A stem convolutional group of 7x7 filter size.
2. A sequence of residual block groups, where each group doubles the number of filters
   of the previous group.
    A. Each residual block consists of two 3x3 convolutional layers, w/o max pooling.
    B. The input to the residual block is added to the output.
3. Between residual groups is a convolutional block that doubles the number of 
   filters from the previous group, so the number of filters coming into and going 
   out of each following residual block are the same for the identity link matrix add operation.
    A. Each convolutional block consists of two 3x3 convolutional layers, each using stride=2 
       to downsample the size of the feature maps.

You will need to:

1. Save the input to the residual block for the identity link.
2. Complete the matrix add of the identity link to the output of the residual block.
3. Set (double) the number of filters for the convolutional block between residual block groups so the filter counts match for the matrix add operations.
4. Add the global averaging layer between the feature learning groups and the classifier.
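
To see why the number of filters has to match before the identity link can be added, here is a small plain-Python sketch. The helper add_shapes is hypothetical and only compares shapes. It is a simplification of what an element-wise add of two feature map tensors requires, not the Keras implementation.

```python
def add_shapes(shortcut_shape, output_shape):
    """Shape of an element-wise add; here both operands must have identical shapes."""
    if shortcut_shape != output_shape:
        raise ValueError(f"cannot add {shortcut_shape} and {output_shape}")
    return output_shape

# Inside a residual group: two stride-1, 'same'-padded convolutions with 64
# filters keep the (height, width, channels) shape, so the add works.
print(add_shapes((56, 56, 64), (56, 56, 64)))

# Jumping straight to 128 filters would leave the saved input with 64 channels.
try:
    add_shapes((56, 56, 64), (56, 56, 128))
except ValueError as err:
    print("mismatch:", err)
```

This is the job of the convolutional block between groups: it changes the number of filters first, so every residual block that follows sees the same shape on both sides of the add.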

In [ ]:
from tensorflow.keras import Model
import tensorflow.keras.layers as layers

def residual_block(n_filters, x):
    """ Create a Residual Block of Convolutions
        n_filters: number of filters
        x        : input into the block
    """
    # Save the input as the shortcut for the identity link
    # Hint: read the comment on the params to the function.
    shortcut = ??
    x = layers.Conv2D(n_filters, (3, 3), strides=(1, 1), padding="same",
                      activation="relu")(x)
    x = layers.Conv2D(n_filters, (3, 3), strides=(1, 1), padding="same",
                      activation="relu")(x)
    # Add the saved input (identity link) to the output.
    # HINT: the name of the variable you used above to save the input.
    x = layers.add([??, x])
    return x

def conv_block(n_filters, x):
    """ Create Block of Convolutions without Pooling
        n_filters: number of filters
        x        : input into the block
    """
    x = layers.Conv2D(n_filters, (3, 3), strides=(2, 2), padding="same",
                      activation="relu")(x)
    x = layers.Conv2D(n_filters, (3, 3), strides=(2, 2), padding="same",
                      activation="relu")(x)
    return x

# The input tensor
inputs = layers.Input(shape=(224, 224, 3))

# First Convolutional layer, where pooled feature maps will be reduced by 75%
x = layers.Conv2D(64, kernel_size=(7, 7), strides=(2, 2), padding='same', activation='relu')(inputs)
x = layers.MaxPool2D(pool_size=(3, 3), strides=(2, 2), padding='same')(x)

# First Residual Block Group of 64 filters
for _ in range(3):
    x = residual_block(64, x)

# Double the number of filters and downsample the feature maps (two stride-2 convolutions) to fit the next Residual Group
# HINT: number should be twice as big as the number of filters in prior residual_blocks.
x = conv_block(??, x)

# Second Residual Block Group of 128 filters
for _ in range(3):
    x = residual_block(128, x)

# Double the number of filters and downsample the feature maps (two stride-2 convolutions) to fit the next Residual Group
x = conv_block(??, x)

# Third Residual Block Group of 256 filters
for _ in range(5):
    x = residual_block(256, x)

# Double the number of filters and downsample the feature maps (two stride-2 convolutions) to fit the next Residual Group
x = conv_block(??, x)

# Fourth Residual Block Group of 512 filters
for _ in range(2):
    x = residual_block(??, x)

# Add a Global Average Pooling layer (in place of a Flatten) at the end of all the convolutional residual blocks
x = layers.??()(x)

# Final Dense Outputting Layer for 1000 outputs
outputs = layers.Dense(1000, activation='softmax')(x)

model = Model(inputs, outputs)

Verify the model architecture using the summary() method

It should look like below:

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 224, 224, 3)  0                                            
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 112, 112, 64) 9472        input_1[0][0]                    
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 56, 56, 64)   0           conv2d_1[0][0]                   
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 56, 56, 64)   36928       max_pooling2d_1[0][0]            
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 56, 56, 64)   36928       conv2d_2[0][0]                   
__________________________________________________________________________________________________
add_1 (Add)                     (None, 56, 56, 64)   0           max_pooling2d_1[0][0]            
                                                                 conv2d_3[0][0]                   
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 56, 56, 64)   36928       add_1[0][0]                      
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 56, 56, 64)   36928       conv2d_4[0][0]                   
__________________________________________________________________________________________________
add_2 (Add)                     (None, 56, 56, 64)   0           add_1[0][0]                      
                                                                 conv2d_5[0][0]                   
__________________________________________________________________________________________________
conv2d_6 (Conv2D)               (None, 56, 56, 64)   36928       add_2[0][0]                      
__________________________________________________________________________________________________
conv2d_7 (Conv2D)               (None, 56, 56, 64)   36928       conv2d_6[0][0]                   
__________________________________________________________________________________________________
add_3 (Add)                     (None, 56, 56, 64)   0           add_2[0][0]                      
                                                                 conv2d_7[0][0]                   
__________________________________________________________________________________________________
conv2d_8 (Conv2D)               (None, 28, 28, 128)  73856       add_3[0][0]                      
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 14, 14, 128)  147584      conv2d_8[0][0]                   
__________________________________________________________________________________________________
conv2d_10 (Conv2D)              (None, 14, 14, 128)  147584      conv2d_9[0][0]                   
__________________________________________________________________________________________________
conv2d_11 (Conv2D)              (None, 14, 14, 128)  147584      conv2d_10[0][0]                  
__________________________________________________________________________________________________
add_4 (Add)                     (None, 14, 14, 128)  0           conv2d_9[0][0]                   
                                                                 conv2d_11[0][0]                  
__________________________________________________________________________________________________
conv2d_12 (Conv2D)              (None, 14, 14, 128)  147584      add_4[0][0]                      
__________________________________________________________________________________________________
conv2d_13 (Conv2D)              (None, 14, 14, 128)  147584      conv2d_12[0][0]                  
__________________________________________________________________________________________________
add_5 (Add)                     (None, 14, 14, 128)  0           add_4[0][0]                      
                                                                 conv2d_13[0][0]                  
__________________________________________________________________________________________________
conv2d_14 (Conv2D)              (None, 14, 14, 128)  147584      add_5[0][0]                      
__________________________________________________________________________________________________
conv2d_15 (Conv2D)              (None, 14, 14, 128)  147584      conv2d_14[0][0]                  
__________________________________________________________________________________________________
add_6 (Add)                     (None, 14, 14, 128)  0           add_5[0][0]                      
                                                                 conv2d_15[0][0]                  
__________________________________________________________________________________________________
conv2d_16 (Conv2D)              (None, 7, 7, 256)    295168      add_6[0][0]                      
__________________________________________________________________________________________________
conv2d_17 (Conv2D)              (None, 4, 4, 256)    590080      conv2d_16[0][0]                  
__________________________________________________________________________________________________
conv2d_18 (Conv2D)              (None, 4, 4, 256)    590080      conv2d_17[0][0]                  
__________________________________________________________________________________________________
conv2d_19 (Conv2D)              (None, 4, 4, 256)    590080      conv2d_18[0][0]                  
__________________________________________________________________________________________________
add_7 (Add)                     (None, 4, 4, 256)    0           conv2d_17[0][0]                  
                                                                 conv2d_19[0][0]                  
__________________________________________________________________________________________________
conv2d_20 (Conv2D)              (None, 4, 4, 256)    590080      add_7[0][0]                      
__________________________________________________________________________________________________
conv2d_21 (Conv2D)              (None, 4, 4, 256)    590080      conv2d_20[0][0]                  
__________________________________________________________________________________________________
add_8 (Add)                     (None, 4, 4, 256)    0           add_7[0][0]                      
                                                                 conv2d_21[0][0]                  
__________________________________________________________________________________________________
conv2d_22 (Conv2D)              (None, 4, 4, 256)    590080      add_8[0][0]                      
__________________________________________________________________________________________________
conv2d_23 (Conv2D)              (None, 4, 4, 256)    590080      conv2d_22[0][0]                  
__________________________________________________________________________________________________
add_9 (Add)                     (None, 4, 4, 256)    0           add_8[0][0]                      
                                                                 conv2d_23[0][0]                  
__________________________________________________________________________________________________
conv2d_24 (Conv2D)              (None, 4, 4, 256)    590080      add_9[0][0]                      
__________________________________________________________________________________________________
conv2d_25 (Conv2D)              (None, 4, 4, 256)    590080      conv2d_24[0][0]                  
__________________________________________________________________________________________________
add_10 (Add)                    (None, 4, 4, 256)    0           add_9[0][0]                      
                                                                 conv2d_25[0][0]                  
__________________________________________________________________________________________________
conv2d_26 (Conv2D)              (None, 4, 4, 256)    590080      add_10[0][0]                     
__________________________________________________________________________________________________
conv2d_27 (Conv2D)              (None, 4, 4, 256)    590080      conv2d_26[0][0]                  
__________________________________________________________________________________________________
add_11 (Add)                    (None, 4, 4, 256)    0           add_10[0][0]                     
                                                                 conv2d_27[0][0]                  
__________________________________________________________________________________________________
conv2d_28 (Conv2D)              (None, 2, 2, 512)    1180160     add_11[0][0]                     
__________________________________________________________________________________________________
conv2d_29 (Conv2D)              (None, 1, 1, 512)    2359808     conv2d_28[0][0]                  
__________________________________________________________________________________________________
conv2d_30 (Conv2D)              (None, 1, 1, 512)    2359808     conv2d_29[0][0]                  
__________________________________________________________________________________________________
conv2d_31 (Conv2D)              (None, 1, 1, 512)    2359808     conv2d_30[0][0]                  
__________________________________________________________________________________________________
add_12 (Add)                    (None, 1, 1, 512)    0           conv2d_29[0][0]                  
                                                                 conv2d_31[0][0]                  
__________________________________________________________________________________________________
conv2d_32 (Conv2D)              (None, 1, 1, 512)    2359808     add_12[0][0]                     
__________________________________________________________________________________________________
conv2d_33 (Conv2D)              (None, 1, 1, 512)    2359808     conv2d_32[0][0]                  
__________________________________________________________________________________________________
add_13 (Add)                    (None, 1, 1, 512)    0           add_12[0][0]                     
                                                                 conv2d_33[0][0]                  
__________________________________________________________________________________________________
global_average_pooling2d_1 (Glo (None, 512)          0           add_13[0][0]                     
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 1000)         513000      global_average_pooling2d_1[0][0] 
==================================================================================================
Total params: 21,616,232
Trainable params: 21,616,232
Non-trainable params: 0
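
The feature map sizes in this summary can be traced by hand. With padding='same', the output size depends only on the input size and the stride: ceil(n / stride). The sketch below is plain Python, and the helper name same_out is made up for illustration.

```python
import math

def same_out(n, stride):
    """Output width/height with padding='same': ceil(n / stride)."""
    return math.ceil(n / stride)

size = same_out(224, 2)     # 7x7 stem convolution with stride 2 -> 112
size = same_out(size, 2)    # 3x3 max pooling with stride 2     -> 56

trace = [size]
for _ in range(3):          # the three convolutional blocks between residual groups
    # each block applies two stride-2 convolutions in a row
    size = same_out(same_out(size, 2), 2)
    trace.append(size)

print(trace)                # [56, 14, 4, 1]
```

Because each convolutional block applies stride 2 twice, the height and width drop by about 4x per block (56 to 14, for example), so the last residual group works on 1x1 feature maps.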

In [ ]:
model.summary()

Training

Next, we will train two mini-VGGs (6 and 10 layers) on the CIFAR-10 dataset and compare the results. As we have not covered data preprocessing or training yet, just follow the steps.

VGG (6)

Let's make a 6 layer VGG.


In [ ]:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
def makeVGG6():
    def conv_block(n_layers, n_filters):
        """
            n_layers : number of convolutional layers
            n_filters: number of filters
        """
        for n in range(n_layers):
            model.add(Conv2D(n_filters, (3, 3), strides=(1, 1), padding="same",
                      activation="relu"))
        model.add(MaxPooling2D(2, strides=2))
        
    model = Sequential()
    model.add(Conv2D(64, (3, 3), strides=(1, 1), padding='same', activation="relu",
              input_shape=(32, 32, 3)))

    # These are the convolutional groups
    conv_block(1, 64)
    conv_block(2, 128)
    model.add(Flatten())
    model.add(Dense(4096, activation='relu'))
    model.add(Dense(10, activation='softmax'))
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['acc'])
    return model

vgg6 = makeVGG6()

Let's now check the summary(). You should see about 34 million parameters.


In [ ]:
vgg6.summary()

VGG(10)

Let's now make a 10 layer VGG.


In [ ]:
def makeVGG10():
    def conv_block(n_layers, n_filters):
        """
            n_layers : number of convolutional layers
            n_filters: number of filters
        """
        for n in range(n_layers):
            model.add(Conv2D(n_filters, (3, 3), strides=(1, 1), padding="same",
                      activation="relu"))
        model.add(MaxPooling2D(2, strides=2))
        
    model = Sequential()
    model.add(Conv2D(64, (3, 3), strides=(1, 1), padding='same', activation="relu",
              input_shape=(32, 32, 3)))

    # These are the convolutional groups
    conv_block(1, 64)
    conv_block(2, 128)
    conv_block(3, 256)
    model.add(Flatten())
    model.add(Dense(4096, activation='relu'))
    model.add(Dense(4096, activation='relu'))
    model.add(Dense(10, activation='softmax'))
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['acc'])
    return model

vgg10 = makeVGG10()

Let's now check the summary(). You should see about 35 million parameters. Note how the two models have nearly the same number of parameters, but the 10-layer VGG is deeper.
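
If you want to see where the two totals come from, the following plain-Python sketch counts the parameters of both mini-VGGs from their layer definitions. The function vgg_params and its arguments are hypothetical helpers written for this illustration. They mirror the structure of makeVGG6() and makeVGG10() but do not use Keras.

```python
def conv_params(in_channels, filters, kernel=3):
    """kernel x kernel x in_channels weights per filter, plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(in_units, out_units):
    """One weight per input/output pair, plus one bias per output."""
    return in_units * out_units + out_units

def vgg_params(groups, dense_units, input_side=32, classes=10):
    """groups: list of (n_layers, n_filters); dense_units: hidden dense layer sizes."""
    total = conv_params(3, 64)           # stem: 64 3x3 filters on an RGB image
    channels, side = 64, input_side
    for n_layers, filters in groups:
        for _ in range(n_layers):
            total += conv_params(channels, filters)
            channels = filters
        side //= 2                       # 2x2 max pooling with stride 2
    units = side * side * channels       # flattened feature maps
    for width in dense_units:
        total += dense_params(units, width)
        units = width
    return total + dense_params(units, classes)

vgg6_params = vgg_params([(1, 64), (2, 128)], [4096])
vgg10_params = vgg_params([(1, 64), (2, 128), (3, 256)], [4096, 4096])
print(vgg6_params)    # 33859658
print(vgg10_params)   # 35339082
```

The extra pooling step in the 10-layer model shrinks the flattened vector from 8192 to 4096 values, which halves the first dense layer and roughly offsets the parameters added by the extra convolutional and dense layers.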


In [ ]:
vgg10.summary()

Dataset

Let's get the tf.Keras built-in dataset for CIFAR-10. These are 32x32 color images (3 channels) of 10 classes (airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks). We will preprocess the image data (not covered yet).


In [ ]:
from tensorflow.keras.datasets import cifar10
import numpy as np

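(x_train, y_train), (x_test, y_test) = cifar10.load_data()
# Scale pixel values from [0, 255] to [0, 1]; apply the same scaling to the test set
x_train = (x_train / 255.0).astype(np.float32)
x_test = (x_test / 255.0).astype(np.float32)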

Results

Let's train both the 6 and 10 layer VGG on CIFAR-10 for 3 epochs and compare the results.


In [ ]:
vgg6.fit(x_train, y_train, epochs=3, batch_size=32, validation_split=0.1, verbose=1)

In [ ]:
vgg10.fit(x_train, y_train, epochs=3, batch_size=32, validation_split=0.1, verbose=1)
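
When reading the progress output of the two fit() calls, it helps to know how many batches to expect per epoch. The arithmetic below is a plain-Python sketch. It assumes the 50,000-image CIFAR-10 training set and that validation_split=0.1 holds out one tenth of it for validation.

```python
import math

train_images = 50000                      # CIFAR-10 training set size
val_images = train_images // 10           # held out by validation_split=0.1
fit_images = train_images - val_images    # images actually used for weight updates
steps_per_epoch = math.ceil(fit_images / 32)   # batch_size=32

print(fit_images, val_images, steps_per_epoch)  # 45000 5000 1407
```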

Observation

Notice how the shallower VGG (6 layers) increases in accuracy across all three epochs (??), but the deeper VGG (10 layers) does not; in fact, it learns nothing (10% accuracy is the same as random guessing across 10 classes).

While the loss does not diverge (we do not see a NaN on the loss, which would point to exploding gradients), it does show how early CNN architectures became less reliable to converge as they were made deeper (not covered yet).

If we use a larger image size, like 224x224, we can add more layers because we have more pixel data, but eventually we hit the same problem again.
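
A quick way to recognize the "learns nothing" case in the training log is to compare the reported numbers against the chance baseline. This plain-Python sketch assumes a model that outputs the same probability for each of the 10 classes; for such a model the cross-entropy loss is -ln(1/10).

```python
import math

num_classes = 10
chance_accuracy = 1 / num_classes           # 0.1, i.e. 10%
chance_loss = -math.log(1 / num_classes)    # cross-entropy of a uniform guess

print(chance_accuracy)                      # 0.1
print(round(chance_loss, 4))                # 2.3026
```

If the accuracy stays near 0.10 and the loss stays near 2.30 from epoch to epoch, the model is doing no better than guessing.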

End of Lab Exercise